Double-ended prediction of the naturalness ratings of the Blizzard Challenge 2008-2013

Authors

  • Lukas Latacz
  • Werner Verhelst
Abstract

In this paper we describe a double-ended (i.e. reference-based or intrusive) approach to objective quality estimation of synthetic speech that uses a linear regression model whose parameters can easily be interpreted. The model was trained and evaluated on English data from the 2008 to 2013 Blizzard Challenges (BC) [1], which form the largest publicly available resource of listener-evaluated synthetic speech. To our knowledge, this is the first attempt to train and evaluate a speech quality predictor on the whole data set. Predicting the naturalness of the different participating systems in the BC is not an easy task because some of the systems are quite close in quality. Our best results correspond to Pearson correlation coefficients of 0.60 and 0.84 for sentences and systems, respectively, using a leave-one-system-out evaluation, which by far outperformed the ITU-T standard PESQ [2] for double-ended speech quality evaluation on this data.
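The evaluation protocol named in the abstract (a linear regression model scored by Pearson correlation under leave-one-system-out cross-validation, at both sentence and system level) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the data are synthetic, the two features are hypothetical stand-ins for whatever double-ended features the paper uses, and the function names are not from the paper.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def leave_one_system_out(features, ratings, system_ids):
    """Predict every sentence with a linear model fitted on all other systems."""
    features = np.asarray(features, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    system_ids = np.asarray(system_ids)
    # Prepend a column of ones so the least-squares fit includes an intercept.
    design = np.column_stack([np.ones(len(features)), features])
    predictions = np.empty_like(ratings)
    for system in np.unique(system_ids):
        held_out = system_ids == system
        weights, *_ = np.linalg.lstsq(
            design[~held_out], ratings[~held_out], rcond=None
        )
        predictions[held_out] = design[held_out] @ weights
    return predictions


def system_means(values, system_ids):
    """Average sentence-level values per system (systems in sorted order)."""
    return np.array([values[system_ids == s].mean() for s in np.unique(system_ids)])


# Synthetic stand-in data (NOT Blizzard Challenge data): 8 systems, 30 sentences each.
rng = np.random.default_rng(0)
n_systems, n_sentences = 8, 30
system_ids = np.repeat(np.arange(n_systems), n_sentences)
system_quality = rng.uniform(1.5, 4.5, size=n_systems)
ratings = system_quality[system_ids] + rng.normal(0.0, 0.7, size=system_ids.size)
# Hypothetical features: one noisy correlate of the rating, one pure-noise column.
features = np.column_stack([
    ratings + rng.normal(0.0, 1.0, size=ratings.size),
    rng.normal(size=ratings.size),
])

predictions = leave_one_system_out(features, ratings, system_ids)
r_sentence = pearson(predictions, ratings)
r_system = pearson(
    system_means(predictions, system_ids), system_means(ratings, system_ids)
)
print(f"sentence-level r = {r_sentence:.2f}, system-level r = {r_system:.2f}")
```

Holding out all sentences of one system at a time means the model is never scored on a system it saw during fitting, and averaging predictions per system before correlating is one plausible reading of the system-level figure reported above.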

Similar resources

Improving Instrumental Quality Prediction Performance for the Blizzard Challenge

In this paper, the performance of the standard instrumental quality prediction algorithm ITU-T P.563 is reported based on the 2007 and 2008 Blizzard Challenge speech data. The algorithm, which is optimized for natural speech, is shown to obtain poor correlation with subjective quality ratings. In an attempt to improve instrumental quality prediction performance for the Blizzard Challenge, modif...

The Blizzard Challenge 2007

In Blizzard 2007, the third Blizzard Challenge, participants were asked to build voices from a dataset, a defined subset and, following certain constraints, a subset of their choice. A set of test sentences was then released to be synthesised. An online evaluation of the submitted synthesised sentences focused on naturalness and intelligibility, and added new sections for degree of similarity t...

The USTC System for Blizzard Challenge 2008

This paper introduces the speech synthesis system developed by USTC for Blizzard Challenge 2008. Two synthetic voices from the released UK English database are built using the HMM-based unit selection synthesis method, which is a hybrid of statistical parametric synthesis and unit-selection techniques. In this method, the optimal sequence of phone-sized candidate units is selected from the datab...

The USTC System for Blizzard Challenge 2009

This paper introduces USTC's speech synthesis system for Blizzard Challenge 2009. USTC took part in all English tasks, including the hub tasks and the spoke tasks. According to the various conditions for different tasks, different versions of HMM-based unit-selection systems are constructed based on the USTC Blizzard Challenge 2008 system. Many new techniques are employed in our speech synthesis...

The HTS-2008 System: Yet Another Evaluation of the Speaker-Adaptive HMM-based Speech Synthesis System in The 2008 Blizzard Challenge

For the 2008 Blizzard Challenge, we used the same speaker-adaptive approach to HMM-based speech synthesis that was used in the HTS entry to the 2007 challenge, but an improved system was built in which the multi-accented English average voice model was trained on 41 hours of speech data with high-order mel-cepstral analysis using an efficient forward-backward algorithm for the HSMM. The listener ...

Publication date: 2015